Dataset statistics
| | Dataset A | Dataset B |
|---|---|---|
| Number of variables | 12 | 12 |
| Number of observations | 446 | 446 |
| Missing cells | 417 | 435 |
| Missing cells (%) | 7.8% | 8.1% |
| Duplicate rows | 0 | 0 |
| Duplicate rows (%) | 0.0% | 0.0% |
| Total size in memory | 45.3 KiB | 45.3 KiB |
| Average record size in memory | 104.0 B | 104.0 B |
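The derived rows in this table follow from the raw counts: 446 rows × 12 columns = 5352 cells, so 417 missing cells is 7.8%. The sketch below recomputes them. It assumes that "Missing cells (%)" is missing cells divided by rows × columns and that 1 KiB = 1024 bytes. The function names are illustrative and are not part of the ydata-profiling API.

```python
# Recompute the derived rows of the "Dataset statistics" table.
# Assumptions: missing-cell share = missing / (rows * columns); 1 KiB = 1024 bytes.


def missing_cells_pct(missing_cells: int, n_rows: int, n_cols: int) -> float:
    """Share of all cells that are missing, as a percentage."""
    return 100.0 * missing_cells / (n_rows * n_cols)


def avg_record_size_bytes(total_kib: float, n_rows: int) -> float:
    """Average in-memory size of one row, in bytes."""
    return total_kib * 1024 / n_rows


if __name__ == "__main__":
    for label, missing in (("Dataset A", 417), ("Dataset B", 435)):
        print(f"{label}: {missing_cells_pct(missing, 446, 12):.1f}% missing cells")
    print(f"Average record size: {avg_record_size_bytes(45.3, 446):.1f} B")
```

Because "45.3 KiB" is already rounded, the recomputed record size agrees with the report only to the displayed precision.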
Variable types
| | Dataset A | Dataset B |
|---|---|---|
| Numeric | 5 | 5 |
| Categorical | 4 | 4 |
| Text | 3 | 3 |
Alerts
| Dataset A | Dataset B | Type |
|---|---|---|
| Age has 74 (16.6%) missing values | Age has 90 (20.2%) missing values | Missing |
| Cabin has 343 (76.9%) missing values | Cabin has 345 (77.4%) missing values | Missing |
| PassengerId has unique values | PassengerId has unique values | Unique |
| Name has unique values | Name has unique values | Unique |
| SibSp has 302 (67.7%) zeros | SibSp has 305 (68.4%) zeros | Zeros |
| Parch has 346 (77.6%) zeros | Parch has 340 (76.2%) zeros | Zeros |
| Fare has 9 (2.0%) zeros | Fare has 5 (1.1%) zeros | Zeros |
Reproduction
| | Dataset A | Dataset B |
|---|---|---|
| Analysis started | 2024-02-26 14:53:39.358022 | 2024-02-26 14:53:43.744644 |
| Analysis finished | 2024-02-26 14:53:43.743559 | 2024-02-26 14:53:47.704923 |
| Duration | 4.39 seconds | 3.96 seconds |
| Software version | ydata-profiling v0.0.dev0 | ydata-profiling v0.0.dev0 |
| Download configuration | config.json | config.json |
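The Duration row is the difference between the "Analysis finished" and "Analysis started" timestamps. The standard-library sketch below checks this. The helper name is hypothetical.

```python
from datetime import datetime

# Timestamp layout used in the Reproduction table above.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def duration_seconds(started: str, finished: str) -> float:
    """Seconds elapsed between two report timestamps."""
    start = datetime.strptime(started, TIMESTAMP_FORMAT)
    end = datetime.strptime(finished, TIMESTAMP_FORMAT)
    return (end - start).total_seconds()


if __name__ == "__main__":
    a = duration_seconds("2024-02-26 14:53:39.358022", "2024-02-26 14:53:43.743559")
    b = duration_seconds("2024-02-26 14:53:43.744644", "2024-02-26 14:53:47.704923")
    print(f"Dataset A: {a:.2f} s, Dataset B: {b:.2f} s")  # Dataset A: 4.39 s, Dataset B: 3.96 s
```

The same timestamps show that Dataset B's run started about 1 ms after Dataset A's finished, so the two profiles were computed one after the other.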
PassengerId
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 446 | 446 |
| Distinct (%) | 100.0% | 100.0% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 447.57623 | 445.80493 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 1 | 1 |
| Maximum | 891 | 889 |
| Zeros | 0 | 0 |
| Zeros (%) | 0.0% | 0.0% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 1 | 1 |
| 5th percentile | 48.5 | 33.25 |
| Q1 | 232.5 | 222.5 |
| Median | 456 | 457.5 |
| Q3 | 664.75 | 665.5 |
| 95th percentile | 844.25 | 850 |
| Maximum | 891 | 889 |
| Range | 890 | 888 |
| Interquartile range (IQR) | 432.25 | 443 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 257.3442 | 260.46221 |
| Coefficient of variation (CV) | 0.57497289 | 0.58425152 |
| Kurtosis | -1.1901699 | -1.1936929 |
| Mean | 447.57623 | 445.80493 |
| Median Absolute Deviation (MAD) | 215.5 | 222.5 |
| Skewness | -0.005753219 | -0.036756423 |
| Sum | 199619 | 198829 |
| Variance | 66226.038 | 67840.562 |
| Monotonicity | Not monotonic | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 339 | 1 | 0.2% |
| 42 | 1 | 0.2% |
| 399 | 1 | 0.2% |
| 112 | 1 | 0.2% |
| 743 | 1 | 0.2% |
| 209 | 1 | 0.2% |
| 43 | 1 | 0.2% |
| 275 | 1 | 0.2% |
| 834 | 1 | 0.2% |
| 595 | 1 | 0.2% |
| Other values (436) | 436 | 97.8% |
| Value | Count | Frequency (%) |
|---|---|---|
| 46 | 1 | 0.2% |
| 761 | 1 | 0.2% |
| 42 | 1 | 0.2% |
| 820 | 1 | 0.2% |
| 28 | 1 | 0.2% |
| 307 | 1 | 0.2% |
| 256 | 1 | 0.2% |
| 364 | 1 | 0.2% |
| 202 | 1 | 0.2% |
| 314 | 1 | 0.2% |
| Other values (436) | 436 | 97.8% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 1 | 0.2% |
| 2 | 1 | 0.2% |
| 6 | 1 | 0.2% |
| 8 | 1 | 0.2% |
| 10 | 1 | 0.2% |
| 11 | 1 | 0.2% |
| 14 | 1 | 0.2% |
| 17 | 1 | 0.2% |
| 21 | 1 | 0.2% |
| 22 | 1 | 0.2% |
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 1 | 0.2% |
| 2 | 1 | 0.2% |
| 3 | 1 | 0.2% |
| 4 | 1 | 0.2% |
| 5 | 1 | 0.2% |
| 6 | 1 | 0.2% |
| 7 | 1 | 0.2% |
| 8 | 1 | 0.2% |
| 10 | 1 | 0.2% |
| 13 | 1 | 0.2% |
Survived
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 2 | 2 |
| Distinct (%) | 0.4% | 0.4% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Characters and Unicode
| | Dataset A | Dataset B |
|---|---|---|
| Total characters | 446 | 446 |
| Distinct characters | 2 | 2 |
| Distinct categories | 1 | 1 |
| Distinct scripts | 1 | 1 |
| Distinct blocks | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | 1 | 0 |
| 2nd row | 0 | 1 |
| 3rd row | 1 | 0 |
| 4th row | 0 | 0 |
| 5th row | 1 | 1 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 267 | 59.9% |
| 1 | 179 | 40.1% |
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 277 | 62.1% |
| 1 | 169 | 37.9% |
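Each Frequency (%) cell is the count divided by the table's total and shown with one decimal place. For the Survived values above, the total is 446 rows. The sketch below reproduces that formatting. It assumes that non-zero shares that would otherwise display as 0.0% are shown as "< 0.1%", as in the Name character tables of this report. `fmt_frequency` is a hypothetical helper, not the library's own function.

```python
def fmt_frequency(count: int, total: int) -> str:
    """Format count / total as a percentage with one decimal place.

    Assumption: non-zero shares below 0.05% are printed as "< 0.1%".
    """
    share = count / total
    if 0 < share < 0.0005:
        return "< 0.1%"
    # Python's format() rounds exact ties to even, e.g. 6.25 -> "6.2".
    return f"{share * 100:.1f}%"


if __name__ == "__main__":
    print(fmt_frequency(267, 446))   # 59.9%  (Survived = 0, Dataset A)
    print(fmt_frequency(6, 12090))   # < 0.1%  (Dash Punctuation in Name, Dataset B)
```

Tie-breaking matters for small denominators. For example, 1 of 16 is exactly 6.25% and prints as 6.2%, which is consistent with the lowercase-letter table for Ticket.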
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 267 | 59.9% |
| 1 | 179 | 40.1% |
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 277 | 62.1% |
| 1 | 169 | 37.9% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 446 | 100.0% |
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 446 | 100.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 267 | 59.9% |
| 1 | 179 | 40.1% |
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 277 | 62.1% |
| 1 | 169 | 37.9% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 446 | 100.0% |
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 446 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 267 | 59.9% |
| 1 | 179 | 40.1% |
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 277 | 62.1% |
| 1 | 169 | 37.9% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 446 | 100.0% |
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 446 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 267 | 59.9% |
| 1 | 179 | 40.1% |
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 277 | 62.1% |
| 1 | 169 | 37.9% |
Pclass
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 3 | 3 |
| Distinct (%) | 0.7% | 0.7% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Characters and Unicode
| | Dataset A | Dataset B |
|---|---|---|
| Total characters | 446 | 446 |
| Distinct characters | 3 | 3 |
| Distinct categories | 1 | 1 |
| Distinct scripts | 1 | 1 |
| Distinct blocks | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | 3 | 3 |
| 2nd row | 1 | 2 |
| 3rd row | 2 | 3 |
| 4th row | 1 | 3 |
| 5th row | 2 | 2 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 236 | 52.9% |
| 1 | 105 | 23.5% |
| 2 | 105 | 23.5% |
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 252 | 56.5% |
| 1 | 104 | 23.3% |
| 2 | 90 | 20.2% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 236 | 52.9% |
| 1 | 105 | 23.5% |
| 2 | 105 | 23.5% |
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 252 | 56.5% |
| 1 | 104 | 23.3% |
| 2 | 90 | 20.2% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 446 | 100.0% |
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 446 | 100.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 236 | 52.9% |
| 1 | 105 | 23.5% |
| 2 | 105 | 23.5% |
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 252 | 56.5% |
| 1 | 104 | 23.3% |
| 2 | 90 | 20.2% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 446 | 100.0% |
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 446 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 236 | 52.9% |
| 1 | 105 | 23.5% |
| 2 | 105 | 23.5% |
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 252 | 56.5% |
| 1 | 104 | 23.3% |
| 2 | 90 | 20.2% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 446 | 100.0% |
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 446 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 236 | 52.9% |
| 1 | 105 | 23.5% |
| 2 | 105 | 23.5% |
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 252 | 56.5% |
| 1 | 104 | 23.3% |
| 2 | 90 | 20.2% |
Name
Text
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 446 | 446 |
| Distinct (%) | 100.0% | 100.0% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 67 | 67 |
| Median length | 49 | 50 |
| Mean length | 26.730942 | 27.107623 |
| Min length | 12 | 12 |
Characters and Unicode
| | Dataset A | Dataset B |
|---|---|---|
| Total characters | 11922 | 12090 |
| Distinct characters | 59 | 60 |
| Distinct categories | 7 | 7 |
| Distinct scripts | 2 | 2 |
| Distinct blocks | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 446 | 446 |
| Unique (%) | 100.0% | 100.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | Dahl, Mr. Karl Edwart | Rogers, Mr. William John |
| 2nd row | Chaffee, Mr. Herbert Fuller | Mellors, Mr. William John |
| 3rd row | Cameron, Miss. Clear Annie | Goodwin, Mrs. Frederick (Augusta Tyler) |
| 4th row | Giglio, Mr. Victor | Olsen, Mr. Karl Siegwart Andreas |
| 5th row | Richards, Master. George Sibley | Hewlett, Mrs. (Mary D Kingcome) |
Words
| Value | Count | Frequency (%) |
|---|---|---|
| mr | 254 | 14.0% |
| miss | 95 | 5.3% |
| mrs | 66 | 3.7% |
| william | 30 | 1.7% |
| john | 25 | 1.4% |
| henry | 20 | 1.1% |
| master | 18 | 1.0% |
| james | 14 | 0.8% |
| mary | 12 | 0.7% |
| george | 11 | 0.6% |
| Other values (878) | 1263 | 69.9% |
| Value | Count | Frequency (%) |
|---|---|---|
| mr | 265 | 14.5% |
| miss | 88 | 4.8% |
| mrs | 66 | 3.6% |
| william | 30 | 1.6% |
| master | 20 | 1.1% |
| john | 19 | 1.0% |
| henry | 17 | 0.9% |
| thomas | 17 | 0.9% |
| james | 15 | 0.8% |
| edward | 13 | 0.7% |
| Other values (890) | 1281 | 70.0% |
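The word tables appear to be built by lower-casing each value, splitting it on whitespace and stripping leading and trailing punctuation. This would explain tokens such as `mr` (from `Mr.`) here, and `c.a` and `w./c` in the Ticket column. The percentages are relative to all words in the column. For example, `mr` in Dataset A is 254 / 1808 = 14.0%, where 1808 is the top-10 total (545) plus the 1263 other words. The sketch below reproduces that tokenization under this assumption. It is not ydata-profiling's own code, and `word_counts` is a hypothetical name.

```python
import string
from collections import Counter
from collections.abc import Iterable

# Characters removed from both ends of each whitespace-separated token.
STRIP_CHARS = string.punctuation + string.whitespace


def word_counts(values: Iterable[str]) -> Counter:
    """Count lower-cased words after stripping surrounding punctuation."""
    counts: Counter = Counter()
    for value in values:
        for token in value.lower().split():
            word = token.strip(STRIP_CHARS)
            if word:  # skip tokens made only of punctuation
                counts[word] += 1
    return counts


if __name__ == "__main__":
    names = [
        "Rogers, Mr. William John",
        "Mellors, Mr. William John",
        "Goodwin, Mrs. Frederick (Augusta Tyler)",
    ]
    counts = word_counts(names)
    total = sum(counts.values())
    for word, n in counts.most_common(3):
        print(f"{word}: {n} ({100 * n / total:.1f}%)")
    # Punctuation inside a token survives, as in the Ticket word table.
    print(word_counts(["W./C. 6608", "C.A. 2343"]))
```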
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| (space) | 1362 | 11.4% |
| r | 973 | 8.2% |
| e | 845 | 7.1% |
| a | 790 | 6.6% |
| i | 666 | 5.6% |
| n | 638 | 5.4% |
| s | 638 | 5.4% |
| M | 560 | 4.7% |
| l | 541 | 4.5% |
| o | 509 | 4.3% |
| Other values (49) | 4400 | 36.9% |
| Value | Count | Frequency (%) |
|---|---|---|
| (space) | 1387 | 11.5% |
| r | 967 | 8.0% |
| e | 868 | 7.2% |
| a | 846 | 7.0% |
| n | 652 | 5.4% |
| s | 643 | 5.3% |
| i | 635 | 5.3% |
| M | 573 | 4.7% |
| o | 514 | 4.3% |
| l | 513 | 4.2% |
| Other values (50) | 4492 | 37.2% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Lowercase Letter | 7660 | 64.3% |
| Uppercase Letter | 1816 | 15.2% |
| Space Separator | 1362 | 11.4% |
| Other Punctuation | 937 | 7.9% |
| Open Punctuation | 70 | 0.6% |
| Close Punctuation | 70 | 0.6% |
| Dash Punctuation | 7 | 0.1% |
| Value | Count | Frequency (%) |
|---|---|---|
| Lowercase Letter | 7754 | 64.1% |
| Uppercase Letter | 1846 | 15.3% |
| Space Separator | 1387 | 11.5% |
| Other Punctuation | 947 | 7.8% |
| Open Punctuation | 75 | 0.6% |
| Close Punctuation | 75 | 0.6% |
| Dash Punctuation | 6 | < 0.1% |
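The category names are Unicode general categories, which Python exposes as two-letter codes through `unicodedata.category`. The sketch below maps the codes seen in this report to the long names used in these tables. The mapping dict and the helper are illustrative. Unicode scripts and blocks are not exposed by the standard-library module, so the sketch covers categories only.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that appear in this report.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pd": "Dash Punctuation",
}


def category_counts(text: str) -> Counter:
    """Count characters by Unicode general category (long name when known)."""
    counts: Counter = Counter()
    for ch in text:
        code = unicodedata.category(ch)
        counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


if __name__ == "__main__":
    print(category_counts("Hewlett, Mrs. (Mary D Kingcome)"))
```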
Most frequent character per category
Space Separator
| Value | Count | Frequency (%) |
|---|---|---|
| (space) | 1362 | 100.0% |
| Value | Count | Frequency (%) |
|---|---|---|
| (space) | 1387 | 100.0% |
Lowercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| r | 973 | 12.7% |
| e | 845 | 11.0% |
| a | 790 | 10.3% |
| i | 666 | 8.7% |
| n | 638 | 8.3% |
| s | 638 | 8.3% |
| l | 541 | 7.1% |
| o | 509 | 6.6% |
| t | 324 | 4.2% |
| h | 254 | 3.3% |
| Other values (16) | 1482 | 19.3% |
| Value | Count | Frequency (%) |
|---|---|---|
| r | 967 | 12.5% |
| e | 868 | 11.2% |
| a | 846 | 10.9% |
| n | 652 | 8.4% |
| s | 643 | 8.3% |
| i | 635 | 8.2% |
| o | 514 | 6.6% |
| l | 513 | 6.6% |
| t | 341 | 4.4% |
| h | 267 | 3.4% |
| Other values (16) | 1508 | 19.4% |
Uppercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| M | 560 | 30.8% |
| A | 119 | 6.6% |
| J | 113 | 6.2% |
| H | 103 | 5.7% |
| S | 88 | 4.8% |
| E | 86 | 4.7% |
| B | 78 | 4.3% |
| C | 75 | 4.1% |
| W | 69 | 3.8% |
| L | 66 | 3.6% |
| Other values (15) | 459 | 25.3% |
| Value | Count | Frequency (%) |
|---|---|---|
| M | 573 | 31.0% |
| A | 125 | 6.8% |
| J | 111 | 6.0% |
| H | 101 | 5.5% |
| S | 89 | 4.8% |
| C | 85 | 4.6% |
| E | 83 | 4.5% |
| B | 73 | 4.0% |
| W | 69 | 3.7% |
| L | 68 | 3.7% |
| Other values (15) | 469 | 25.4% |
Other Punctuation
| Value | Count | Frequency (%) |
|---|---|---|
| . | 447 | 47.7% |
| , | 446 | 47.6% |
| " | 42 | 4.5% |
| ' | 2 | 0.2% |
| Value | Count | Frequency (%) |
|---|---|---|
| . | 447 | 47.2% |
| , | 446 | 47.1% |
| " | 48 | 5.1% |
| ' | 5 | 0.5% |
| / | 1 | 0.1% |
Open Punctuation
| Value | Count | Frequency (%) |
|---|---|---|
| ( | 70 | 100.0% |
| Value | Count | Frequency (%) |
|---|---|---|
| ( | 75 | 100.0% |
Close Punctuation
| Value | Count | Frequency (%) |
|---|---|---|
| ) | 70 | 100.0% |
| Value | Count | Frequency (%) |
|---|---|---|
| ) | 75 | 100.0% |
Dash Punctuation
| Value | Count | Frequency (%) |
|---|---|---|
| - | 7 | 100.0% |
| Value | Count | Frequency (%) |
|---|---|---|
| - | 6 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Latin | 9476 | 79.5% |
| Common | 2446 | 20.5% |
| Value | Count | Frequency (%) |
|---|---|---|
| Latin | 9600 | 79.4% |
| Common | 2490 | 20.6% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
|---|---|---|
| (space) | 1362 | 55.7% |
| . | 447 | 18.3% |
| , | 446 | 18.2% |
| ( | 70 | 2.9% |
| ) | 70 | 2.9% |
| " | 42 | 1.7% |
| - | 7 | 0.3% |
| ' | 2 | 0.1% |
| Value | Count | Frequency (%) |
|---|---|---|
| (space) | 1387 | 55.7% |
| . | 447 | 18.0% |
| , | 446 | 17.9% |
| ( | 75 | 3.0% |
| ) | 75 | 3.0% |
| " | 48 | 1.9% |
| - | 6 | 0.2% |
| ' | 5 | 0.2% |
| / | 1 | < 0.1% |
Latin
| Value | Count | Frequency (%) |
|---|---|---|
| r | 973 | 10.3% |
| e | 845 | 8.9% |
| a | 790 | 8.3% |
| i | 666 | 7.0% |
| n | 638 | 6.7% |
| s | 638 | 6.7% |
| M | 560 | 5.9% |
| l | 541 | 5.7% |
| o | 509 | 5.4% |
| t | 324 | 3.4% |
| Other values (41) | 2992 | 31.6% |
| Value | Count | Frequency (%) |
|---|---|---|
| r | 967 | 10.1% |
| e | 868 | 9.0% |
| a | 846 | 8.8% |
| n | 652 | 6.8% |
| s | 643 | 6.7% |
| i | 635 | 6.6% |
| M | 573 | 6.0% |
| o | 514 | 5.4% |
| l | 513 | 5.3% |
| t | 341 | 3.6% |
| Other values (41) | 3048 | 31.8% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 11922 | 100.0% |
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 12090 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| (space) | 1362 | 11.4% |
| r | 973 | 8.2% |
| e | 845 | 7.1% |
| a | 790 | 6.6% |
| i | 666 | 5.6% |
| n | 638 | 5.4% |
| s | 638 | 5.4% |
| M | 560 | 4.7% |
| l | 541 | 4.5% |
| o | 509 | 4.3% |
| Other values (49) | 4400 | 36.9% |
| Value | Count | Frequency (%) |
|---|---|---|
| (space) | 1387 | 11.5% |
| r | 967 | 8.0% |
| e | 868 | 7.2% |
| a | 846 | 7.0% |
| n | 652 | 5.4% |
| s | 643 | 5.3% |
| i | 635 | 5.3% |
| M | 573 | 4.7% |
| o | 514 | 4.3% |
| l | 513 | 4.2% |
| Other values (50) | 4492 | 37.2% |
Sex
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 2 | 2 |
| Distinct (%) | 0.4% | 0.4% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 6 | 6 |
| Median length | 4 | 4 |
| Mean length | 4.7264574 | 4.6950673 |
| Min length | 4 | 4 |
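For a categorical column, the length statistics follow directly from the category counts. Dataset A has 284 `male` values (4 characters each) and 162 `female` values (6 characters each). That gives 284 × 4 + 162 × 6 = 2108 characters, which matches "Total characters" below, and a mean length of 2108 / 446 ≈ 4.7265. The sketch below recomputes the table from a value-count mapping. The helper name is hypothetical.

```python
from statistics import median


def length_stats(value_counts: dict[str, int]) -> dict[str, float]:
    """Length statistics of a column given each distinct value and its count."""
    # Expand the counts into one length per row.
    lengths = [len(value) for value, count in value_counts.items() for _ in range(count)]
    return {
        "max": max(lengths),
        "median": median(lengths),
        "mean": sum(lengths) / len(lengths),
        "min": min(lengths),
        "total_characters": sum(lengths),
    }


if __name__ == "__main__":
    print(length_stats({"male": 284, "female": 162}))  # Dataset A
    print(length_stats({"male": 291, "female": 155}))  # Dataset B
```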
Characters and Unicode
| | Dataset A | Dataset B |
|---|---|---|
| Total characters | 2108 | 2094 |
| Distinct characters | 5 | 5 |
| Distinct categories | 1 | 1 |
| Distinct scripts | 1 | 1 |
| Distinct blocks | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | male | male |
| 2nd row | male | male |
| 3rd row | female | female |
| 4th row | male | male |
| 5th row | male | female |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| male | 284 | 63.7% |
| female | 162 | 36.3% |
| Value | Count | Frequency (%) |
|---|---|---|
| male | 291 | 65.2% |
| female | 155 | 34.8% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| e | 608 | 28.8% |
| m | 446 | 21.2% |
| a | 446 | 21.2% |
| l | 446 | 21.2% |
| f | 162 | 7.7% |
| Value | Count | Frequency (%) |
|---|---|---|
| e | 601 | 28.7% |
| m | 446 | 21.3% |
| a | 446 | 21.3% |
| l | 446 | 21.3% |
| f | 155 | 7.4% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Lowercase Letter | 2108 | 100.0% |
| Value | Count | Frequency (%) |
|---|---|---|
| Lowercase Letter | 2094 | 100.0% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| e | 608 | 28.8% |
| m | 446 | 21.2% |
| a | 446 | 21.2% |
| l | 446 | 21.2% |
| f | 162 | 7.7% |
| Value | Count | Frequency (%) |
|---|---|---|
| e | 601 | 28.7% |
| m | 446 | 21.3% |
| a | 446 | 21.3% |
| l | 446 | 21.3% |
| f | 155 | 7.4% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Latin | 2108 | 100.0% |
| Value | Count | Frequency (%) |
|---|---|---|
| Latin | 2094 | 100.0% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
|---|---|---|
| e | 608 | 28.8% |
| m | 446 | 21.2% |
| a | 446 | 21.2% |
| l | 446 | 21.2% |
| f | 162 | 7.7% |
| Value | Count | Frequency (%) |
|---|---|---|
| e | 601 | 28.7% |
| m | 446 | 21.3% |
| a | 446 | 21.3% |
| l | 446 | 21.3% |
| f | 155 | 7.4% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 2108 | 100.0% |
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 2094 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| e | 608 | 28.8% |
| m | 446 | 21.2% |
| a | 446 | 21.2% |
| l | 446 | 21.2% |
| f | 162 | 7.7% |
| Value | Count | Frequency (%) |
|---|---|---|
| e | 601 | 28.7% |
| m | 446 | 21.3% |
| a | 446 | 21.3% |
| l | 446 | 21.3% |
| f | 155 | 7.4% |
Age
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 79 | 73 |
| Distinct (%) | 21.2% | 20.5% |
| Missing | 74 | 90 |
| Missing (%) | 16.6% | 20.2% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 29.730054 | 30.452725 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0.67 | 0.42 |
| Maximum | 74 | 71 |
| Zeros | 0 | 0 |
| Zeros (%) | 0.0% | 0.0% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
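Distinct (%) is not taken over all 446 rows. The reported values only match when missing values are left out of the denominator. For Dataset A, 79 / (446 - 74) = 21.2%, whereas 79 / 446 would give 17.7%. The same holds for Cabin further down, where 82 / (446 - 343) = 79.6%. The sketch below applies that reading. The helper name is hypothetical.

```python
def distinct_pct(distinct: int, n_rows: int, n_missing: int) -> float:
    """Distinct values as a percentage of the non-missing values."""
    return 100.0 * distinct / (n_rows - n_missing)


if __name__ == "__main__":
    for label, distinct, missing in [
        ("Age, Dataset A", 79, 74),
        ("Age, Dataset B", 73, 90),
        ("Cabin, Dataset A", 82, 343),
        ("Cabin, Dataset B", 91, 345),
    ]:
        print(f"{label}: {distinct_pct(distinct, 446, missing):.1f}%")
```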
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0.67 | 0.42 |
| 5th percentile | 4.55 | 5 |
| Q1 | 20.375 | 21 |
| Median | 29 | 29 |
| Q3 | 39 | 39 |
| 95th percentile | 53.45 | 56 |
| Maximum | 74 | 71 |
| Range | 73.33 | 70.58 |
| Interquartile range (IQR) | 18.625 | 18 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 14.105994 | 14.422212 |
| Coefficient of variation (CV) | 0.47446916 | 0.47359349 |
| Kurtosis | -0.04423976 | 0.028659718 |
| Mean | 29.730054 | 30.452725 |
| Median Absolute Deviation (MAD) | 9 | 9 |
| Skewness | 0.24681164 | 0.32642933 |
| Sum | 11059.58 | 10841.17 |
| Variance | 198.97906 | 208.00021 |
| Monotonicity | Not monotonic | Not monotonic |
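The Median Absolute Deviation here appears to be the raw median of |x - median(x)|, without the 1.4826 normal-consistency factor. For a roughly symmetric column, the raw MAD is close to half the IQR. The report is consistent with this: Age shows a MAD of 9 against an IQR of 18.625, and PassengerId shows 215.5 against 432.25. The sketch below implements that unscaled definition, which is an assumption about how the report computes it.

```python
from statistics import median


def mad(values: list[float]) -> float:
    """Unscaled median absolute deviation: median(|x - median(x)|)."""
    center = median(values)
    return median([abs(v - center) for v in values])


if __name__ == "__main__":
    print(mad([1, 2, 3, 4, 100]))  # 1
```

The outlier 100 does not change the result. This robustness is why MAD is a useful companion to the standard deviation for skewed columns such as Fare.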
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 21 | 16 | 3.6% |
| 19 | 16 | 3.6% |
| 24 | 15 | 3.4% |
| 29 | 13 | 2.9% |
| 30 | 12 | 2.7% |
| 22 | 12 | 2.7% |
| 18 | 12 | 2.7% |
| 36 | 12 | 2.7% |
| 28 | 11 | 2.5% |
| 32 | 11 | 2.5% |
| Other values (69) | 242 | 54.3% |
| (Missing) | 74 | 16.6% |
| Value | Count | Frequency (%) |
|---|---|---|
| 22 | 16 | 3.6% |
| 18 | 14 | 3.1% |
| 19 | 14 | 3.1% |
| 24 | 13 | 2.9% |
| 29 | 11 | 2.5% |
| 27 | 11 | 2.5% |
| 25 | 11 | 2.5% |
| 26 | 10 | 2.2% |
| 36 | 10 | 2.2% |
| 16 | 10 | 2.2% |
| Other values (63) | 236 | 52.9% |
| (Missing) | 90 | 20.2% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.67 | 1 | 0.2% |
| 0.75 | 1 | 0.2% |
| 0.83 | 2 | 0.4% |
| 1 | 3 | 0.7% |
| 2 | 5 | 1.1% |
| 3 | 3 | 0.7% |
| 4 | 4 | 0.9% |
| 5 | 3 | 0.7% |
| 6 | 2 | 0.4% |
| 7 | 1 | 0.2% |
| Value | Count | Frequency (%) |
|---|---|---|
| 0.42 | 1 | 0.2% |
| 0.75 | 1 | 0.2% |
| 1 | 3 | 0.7% |
| 2 | 3 | 0.7% |
| 3 | 2 | 0.4% |
| 4 | 6 | 1.3% |
| 5 | 4 | 0.9% |
| 6 | 2 | 0.4% |
| 8 | 3 | 0.7% |
| 9 | 4 | 0.9% |
SibSp
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 7 | 7 |
| Distinct (%) | 1.6% | 1.6% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 0.50224215 | 0.55829596 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 8 | 8 |
| Zeros | 302 | 305 |
| Zeros (%) | 67.7% | 68.4% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5th percentile | 0 | 0 |
| Q1 | 0 | 0 |
| Median | 0 | 0 |
| Q3 | 1 | 1 |
| 95th percentile | 2 | 3 |
| Maximum | 8 | 8 |
| Range | 8 | 8 |
| Interquartile range (IQR) | 1 | 1 |
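SibSp takes only seven distinct values, so its quantiles can be checked against the value-count tables further down. The sketch below assumes linear interpolation between order statistics, which is the NumPy/pandas default. Under that assumption it reproduces the reported 95th percentiles of 2 (Dataset A) and 3 (Dataset B). The helper name is hypothetical.

```python
def quantile_from_counts(counts: dict[float, int], q: float) -> float:
    """q-quantile of a column given as {value: count}, with linear interpolation."""
    values = sorted(v for v, n in counts.items() for _ in range(n))
    pos = (len(values) - 1) * q  # fractional 0-based position
    lo = int(pos)
    hi = min(lo + 1, len(values) - 1)
    return values[lo] + (values[hi] - values[lo]) * (pos - lo)


# SibSp value counts copied from the tables in this report.
SIBSP_A = {0: 302, 1: 109, 2: 15, 3: 6, 4: 9, 5: 3, 8: 2}
SIBSP_B = {0: 305, 1: 99, 2: 16, 3: 9, 4: 9, 5: 3, 8: 5}

if __name__ == "__main__":
    for q in (0.05, 0.25, 0.5, 0.75, 0.95):
        print(q, quantile_from_counts(SIBSP_A, q), quantile_from_counts(SIBSP_B, q))
```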
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 1.0139452 | 1.2029779 |
| Coefficient of variation (CV) | 2.0188373 | 2.1547314 |
| Kurtosis | 16.870595 | 16.759713 |
| Mean | 0.50224215 | 0.55829596 |
| Median Absolute Deviation (MAD) | 0 | 0 |
| Skewness | 3.5084051 | 3.6627275 |
| Sum | 224 | 249 |
| Variance | 1.0280848 | 1.4471557 |
| Monotonicity | Not monotonic | Not monotonic |
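The descriptive statistics can also be recomputed from SibSp's value counts. The figures match the sample variance (n - 1 denominator) and the bias-corrected skewness that pandas' `Series.skew()` uses. For Dataset A, the squared deviations sum to about 457.50, and 457.50 / 445 ≈ 1.0281 matches the reported variance. The sketch below works under those assumptions, and its function name is illustrative.

```python
import math


def describe_counts(counts: dict[float, int]) -> dict[str, float]:
    """Moments of a numeric column given as {value: count}."""
    n = sum(counts.values())
    total = sum(value * count for value, count in counts.items())
    mean = total / n
    # Population central moments (divided by n).
    m2 = sum(count * (value - mean) ** 2 for value, count in counts.items()) / n
    m3 = sum(count * (value - mean) ** 3 for value, count in counts.items()) / n
    variance = m2 * n / (n - 1)  # sample variance, ddof = 1
    std = math.sqrt(variance)
    # Adjusted Fisher-Pearson skewness (bias-corrected).
    skewness = (m3 / m2 ** 1.5) * math.sqrt(n * (n - 1)) / (n - 2)
    return {"sum": total, "mean": mean, "variance": variance,
            "std": std, "cv": std / mean, "skewness": skewness}


# SibSp value counts for Dataset A, copied from this report.
SIBSP_A = {0: 302, 1: 109, 2: 15, 3: 6, 4: 9, 5: 3, 8: 2}

if __name__ == "__main__":
    for key, value in describe_counts(SIBSP_A).items():
        print(f"{key}: {value:.8g}")
```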
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 302 | 67.7% |
| 1 | 109 | 24.4% |
| 2 | 15 | 3.4% |
| 4 | 9 | 2.0% |
| 3 | 6 | 1.3% |
| 5 | 3 | 0.7% |
| 8 | 2 | 0.4% |
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 305 | 68.4% |
| 1 | 99 | 22.2% |
| 2 | 16 | 3.6% |
| 4 | 9 | 2.0% |
| 3 | 9 | 2.0% |
| 8 | 5 | 1.1% |
| 5 | 3 | 0.7% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 302 | 67.7% |
| 1 | 109 | 24.4% |
| 2 | 15 | 3.4% |
| 3 | 6 | 1.3% |
| 4 | 9 | 2.0% |
| 5 | 3 | 0.7% |
| 8 | 2 | 0.4% |
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 305 | 68.4% |
| 1 | 99 | 22.2% |
| 2 | 16 | 3.6% |
| 3 | 9 | 2.0% |
| 4 | 9 | 2.0% |
| 5 | 3 | 0.7% |
| 8 | 5 | 1.1% |
Parch
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 6 | 7 |
| Distinct (%) | 1.3% | 1.6% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 0.35201794 | 0.40134529 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 5 | 6 |
| Zeros | 346 | 340 |
| Zeros (%) | 77.6% | 76.2% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5th percentile | 0 | 0 |
| Q1 | 0 | 0 |
| Median | 0 | 0 |
| Q3 | 0 | 0 |
| 95th percentile | 2 | 2 |
| Maximum | 5 | 6 |
| Range | 5 | 6 |
| Interquartile range (IQR) | 0 | 0 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 0.75807738 | 0.88832527 |
| Coefficient of variation (CV) | 2.1535192 | 2.2133691 |
| Kurtosis | 8.6256736 | 11.399706 |
| Mean | 0.35201794 | 0.40134529 |
| Median Absolute Deviation (MAD) | 0 | 0 |
| Skewness | 2.6398298 | 3.0441951 |
| Sum | 157 | 179 |
| Variance | 0.57468131 | 0.78912178 |
| Monotonicity | Not monotonic | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 346 | 77.6% |
| 1 | 55 | 12.3% |
| 2 | 39 | 8.7% |
| 5 | 2 | 0.4% |
| 3 | 2 | 0.4% |
| 4 | 2 | 0.4% |
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 340 | 76.2% |
| 1 | 59 | 13.2% |
| 2 | 36 | 8.1% |
| 5 | 5 | 1.1% |
| 3 | 3 | 0.7% |
| 4 | 2 | 0.4% |
| 6 | 1 | 0.2% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 346 | 77.6% |
| 1 | 55 | 12.3% |
| 2 | 39 | 8.7% |
| 3 | 2 | 0.4% |
| 4 | 2 | 0.4% |
| 5 | 2 | 0.4% |
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 340 | 76.2% |
| 1 | 59 | 13.2% |
| 2 | 36 | 8.1% |
| 3 | 3 | 0.7% |
| 4 | 2 | 0.4% |
| 5 | 5 | 1.1% |
| 6 | 1 | 0.2% |
Ticket
Text
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 371 | 375 |
| Distinct (%) | 83.2% | 84.1% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 18 | 18 |
| Median length | 17 | 17 |
| Mean length | 6.7847534 | 6.8565022 |
| Min length | 3 | 3 |
Characters and Unicode
| | Dataset A | Dataset B |
|---|---|---|
| Total characters | 3026 | 3058 |
| Distinct characters | 35 | 35 |
| Distinct categories | 5 | 5 |
| Distinct scripts | 2 | 2 |
| Distinct blocks | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 312 | 327 |
| Unique (%) | 70.0% | 73.3% |
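Distinct and Unique measure different things. Distinct counts every different value, while Unique counts only values that occur exactly once. This is why Ticket has 371 distinct but only 312 unique values in Dataset A. Both percentages appear to use the non-missing values as the denominator: for Cabin in Dataset A, 63 / (446 - 343) = 61.2%. The sketch below treats `None` as missing, and `distinct_unique` is a hypothetical helper.

```python
from collections import Counter
from collections.abc import Hashable, Iterable
from typing import Optional


def distinct_unique(values: Iterable[Optional[Hashable]]) -> dict[str, float]:
    """Distinct and unique counts (and percentages) over the non-missing values."""
    present = [v for v in values if v is not None]
    if not present:
        return {"distinct": 0, "distinct_pct": 0.0, "unique": 0, "unique_pct": 0.0}
    counts = Counter(present)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)  # seen exactly once
    n = len(present)
    return {"distinct": distinct, "distinct_pct": 100.0 * distinct / n,
            "unique": unique, "unique_pct": 100.0 * unique / n}


if __name__ == "__main__":
    tickets = ["PC 17593", "29106", "PC 17593", "7598", None]
    print(distinct_unique(tickets))
```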
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | 7598 | S.C./A.4. 23567 |
| 2nd row | W.E.P. 5734 | SW/PP 751 |
| 3rd row | F.C.C. 13528 | CA 2144 |
| 4th row | PC 17593 | 4579 |
| 5th row | 29106 | 248706 |
Words
| Value | Count | Frequency (%) |
|---|---|---|
| pc | 33 | 5.7% |
| c.a | 15 | 2.6% |
| a/5 | 12 | 2.1% |
| 2 | 6 | 1.0% |
| ston/o | 6 | 1.0% |
| w./c | 5 | 0.9% |
| sc/paris | 5 | 0.9% |
| c | 5 | 0.9% |
| ca | 5 | 0.9% |
| soton/o.q | 4 | 0.7% |
| Other values (392) | 479 | 83.3% |
| Value | Count | Frequency (%) |
|---|---|---|
| pc | 29 | 5.0% |
| c.a | 18 | 3.1% |
| ca | 10 | 1.7% |
| a/5 | 7 | 1.2% |
| w./c | 7 | 1.2% |
| 2 | 6 | 1.0% |
| ston/o | 6 | 1.0% |
| sc/paris | 6 | 1.0% |
| 347088 | 5 | 0.9% |
| soton/o.q | 5 | 0.9% |
| Other values (394) | 476 | 82.8% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 354 | 11.7% |
| 1 | 335 | 11.1% |
| 2 | 287 | 9.5% |
| 7 | 253 | 8.4% |
| 4 | 235 | 7.8% |
| 6 | 220 | 7.3% |
| 5 | 201 | 6.6% |
| 0 | 197 | 6.5% |
| 9 | 161 | 5.3% |
| 8 | 151 | 5.0% |
| Other values (25) | 632 | 20.9% |
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 376 | 12.3% |
| 1 | 330 | 10.8% |
| 2 | 287 | 9.4% |
| 7 | 249 | 8.1% |
| 4 | 229 | 7.5% |
| 6 | 210 | 6.9% |
| 0 | 203 | 6.6% |
| 5 | 199 | 6.5% |
| 9 | 171 | 5.6% |
| 8 | 136 | 4.4% |
| Other values (25) | 668 | 21.8% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 2394 | 79.1% |
| Uppercase Letter | 324 | 10.7% |
| Other Punctuation | 163 | 5.4% |
| Space Separator | 129 | 4.3% |
| Lowercase Letter | 16 | 0.5% |
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 2390 | 78.2% |
| Uppercase Letter | 350 | 11.4% |
| Other Punctuation | 173 | 5.7% |
| Space Separator | 129 | 4.2% |
| Lowercase Letter | 16 | 0.5% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 354 | 14.8% |
| 1 | 335 | 14.0% |
| 2 | 287 | 12.0% |
| 7 | 253 | 10.6% |
| 4 | 235 | 9.8% |
| 6 | 220 | 9.2% |
| 5 | 201 | 8.4% |
| 0 | 197 | 8.2% |
| 9 | 161 | 6.7% |
| 8 | 151 | 6.3% |
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 376 | 15.7% |
| 1 | 330 | 13.8% |
| 2 | 287 | 12.0% |
| 7 | 249 | 10.4% |
| 4 | 229 | 9.6% |
| 6 | 210 | 8.8% |
| 0 | 203 | 8.5% |
| 5 | 199 | 8.3% |
| 9 | 171 | 7.2% |
| 8 | 136 | 5.7% |
Space Separator
| Value | Count | Frequency (%) |
|---|---|---|
| (space) | 129 | 100.0% |
| Value | Count | Frequency (%) |
|---|---|---|
| (space) | 129 | 100.0% |
Other Punctuation
| Value | Count | Frequency (%) |
|---|---|---|
| . | 113 | 69.3% |
| / | 50 | 30.7% |
| Value | Count | Frequency (%) |
|---|---|---|
| . | 120 | 69.4% |
| / | 53 | 30.6% |
Uppercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| C | 83 | 25.6% |
| P | 55 | 17.0% |
| A | 45 | 13.9% |
| O | 44 | 13.6% |
| S | 33 | 10.2% |
| N | 17 | 5.2% |
| T | 15 | 4.6% |
| W | 7 | 2.2% |
| Q | 6 | 1.9% |
| F | 5 | 1.5% |
| Other values (6) | 14 | 4.3% |
| Value | Count | Frequency (%) |
|---|---|---|
| C | 87 | 24.9% |
| O | 51 | 14.6% |
| A | 48 | 13.7% |
| P | 48 | 13.7% |
| S | 39 | 11.1% |
| N | 20 | 5.7% |
| T | 18 | 5.1% |
| W | 11 | 3.1% |
| Q | 8 | 2.3% |
| I | 5 | 1.4% |
| Other values (6) | 15 | 4.3% |
Lowercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| a | 4 | 25.0% |
| s | 4 | 25.0% |
| r | 3 | 18.8% |
| i | 3 | 18.8% |
| l | 1 | 6.2% |
| e | 1 | 6.2% |
| Value | Count | Frequency (%) |
|---|---|---|
| a | 4 | 25.0% |
| s | 4 | 25.0% |
| i | 3 | 18.8% |
| r | 3 | 18.8% |
| l | 1 | 6.2% |
| e | 1 | 6.2% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 2686 | 88.8% |
| Latin | 340 | 11.2% |
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 2692 | 88.0% |
| Latin | 366 | 12.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 354 | 13.2% |
| 1 | 335 | 12.5% |
| 2 | 287 | 10.7% |
| 7 | 253 | 9.4% |
| 4 | 235 | 8.7% |
| 6 | 220 | 8.2% |
| 5 | 201 | 7.5% |
| 0 | 197 | 7.3% |
| 9 | 161 | 6.0% |
| 8 | 151 | 5.6% |
| Other values (3) | 292 | 10.9% |
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 376 | 14.0% |
| 1 | 330 | 12.3% |
| 2 | 287 | 10.7% |
| 7 | 249 | 9.2% |
| 4 | 229 | 8.5% |
| 6 | 210 | 7.8% |
| 0 | 203 | 7.5% |
| 5 | 199 | 7.4% |
| 9 | 171 | 6.4% |
| 8 | 136 | 5.1% |
| Other values (3) | 302 | 11.2% |
Latin
| Value | Count | Frequency (%) |
|---|---|---|
| C | 83 | 24.4% |
| P | 55 | 16.2% |
| A | 45 | 13.2% |
| O | 44 | 12.9% |
| S | 33 | 9.7% |
| N | 17 | 5.0% |
| T | 15 | 4.4% |
| W | 7 | 2.1% |
| Q | 6 | 1.8% |
| F | 5 | 1.5% |
| Other values (12) | 30 | 8.8% |
| Value | Count | Frequency (%) |
|---|---|---|
| C | 87 | 23.8% |
| O | 51 | 13.9% |
| A | 48 | 13.1% |
| P | 48 | 13.1% |
| S | 39 | 10.7% |
| N | 20 | 5.5% |
| T | 18 | 4.9% |
| W | 11 | 3.0% |
| Q | 8 | 2.2% |
| I | 5 | 1.4% |
| Other values (12) | 31 | 8.5% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 3026 | 100.0% |
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 3058 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 354 | 11.7% |
| 1 | 335 | 11.1% |
| 2 | 287 | 9.5% |
| 7 | 253 | 8.4% |
| 4 | 235 | 7.8% |
| 6 | 220 | 7.3% |
| 5 | 201 | 6.6% |
| 0 | 197 | 6.5% |
| 9 | 161 | 5.3% |
| 8 | 151 | 5.0% |
| Other values (25) | 632 | 20.9% |
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 376 | 12.3% |
| 1 | 330 | 10.8% |
| 2 | 287 | 9.4% |
| 7 | 249 | 8.1% |
| 4 | 229 | 7.5% |
| 6 | 210 | 6.9% |
| 0 | 203 | 6.6% |
| 5 | 199 | 6.5% |
| 9 | 171 | 5.6% |
| 8 | 136 | 4.4% |
| Other values (25) | 668 | 21.8% |
Fare
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 172 | 177 |
| Distinct (%) | 38.6% | 39.7% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 33.150223 | 33.447393 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 512.3292 | 512.3292 |
| Zeros | 9 | 5 |
| Zeros (%) | 2.0% | 1.1% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5th percentile | 7.225 | 7.225 |
| Q1 | 7.925 | 8.05 |
| Median | 13.64585 | 14.5 |
| Q3 | 31.359375 | 31.3875 |
| 95th percentile | 130.2375 | 112.67708 |
| Maximum | 512.3292 | 512.3292 |
| Range | 512.3292 | 512.3292 |
| Interquartile range (IQR) | 23.434375 | 23.3375 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 49.965571 | 52.921279 |
| Coefficient of variation (CV) | 1.5072469 | 1.5822244 |
| Kurtosis | 24.57699 | 33.309281 |
| Mean | 33.150223 | 33.447393 |
| Median Absolute Deviation (MAD) | 6.39585 | 7.25 |
| Skewness | 4.0853779 | 4.8818584 |
| Sum | 14784.999 | 14917.537 |
| Variance | 2496.5583 | 2800.6618 |
| Monotonicity | Not monotonic | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 13 | 24 | 5.4% |
| 8.05 | 22 | 4.9% |
| 10.5 | 18 | 4.0% |
| 7.75 | 18 | 4.0% |
| 7.8958 | 15 | 3.4% |
| 26 | 14 | 3.1% |
| 7.25 | 10 | 2.2% |
| 0 | 9 | 2.0% |
| 7.775 | 8 | 1.8% |
| 7.925 | 8 | 1.8% |
| Other values (162) | 300 | 67.3% |
| Value | Count | Frequency (%) |
|---|---|---|
| 8.05 | 23 | 5.2% |
| 10.5 | 19 | 4.3% |
| 7.8958 | 18 | 4.0% |
| 13 | 16 | 3.6% |
| 7.75 | 15 | 3.4% |
| 26 | 14 | 3.1% |
| 7.25 | 9 | 2.0% |
| 26.55 | 8 | 1.8% |
| 8.6625 | 8 | 1.8% |
| 7.925 | 8 | 1.8% |
| Other values (167) | 308 | 69.1% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 9 | 2.0% |
| 4.0125 | 1 | 0.2% |
| 5 | 1 | 0.2% |
| 6.45 | 1 | 0.2% |
| 6.4958 | 1 | 0.2% |
| 6.75 | 1 | 0.2% |
| 6.975 | 2 | 0.4% |
| 7.05 | 3 | 0.7% |
| 7.125 | 1 | 0.2% |
| 7.1417 | 1 | 0.2% |
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 5 | 1.1% |
| 5 | 1 | 0.2% |
| 6.4958 | 1 | 0.2% |
| 6.75 | 1 | 0.2% |
| 6.8583 | 1 | 0.2% |
| 6.975 | 2 | 0.4% |
| 7.0458 | 1 | 0.2% |
| 7.05 | 4 | 0.9% |
| 7.125 | 1 | 0.2% |
| 7.1417 | 1 | 0.2% |
Cabin
Text
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 82 | 91 |
| Distinct (%) | 79.6% | 90.1% |
| Missing | 343 | 345 |
| Missing (%) | 76.9% | 77.4% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 15 | 15 |
| Median length | 3 | 3 |
| Mean length | 3.6990291 | 3.5643564 |
| Min length | 1 | 1 |
Characters and Unicode
| | Dataset A | Dataset B |
|---|---|---|
| Total characters | 381 | 360 |
| Distinct characters | 19 | 19 |
| Distinct categories | 3 | 3 |
| Distinct scripts | 2 | 2 |
| Distinct blocks | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 63 | 81 |
| Unique (%) | 61.2% | 80.2% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | E31 | E101 |
| 2nd row | B86 | C47 |
| 3rd row | B51 B53 B55 | E77 |
| 4th row | E63 | D19 |
| 5th row | A10 | C126 |
| Value | Count | Frequency (%) |
| d | 3 | 2.4% |
| b98 | 3 | 2.4% |
| b96 | 3 | 2.4% |
| b66 | 2 | 1.6% |
| e67 | 2 | 1.6% |
| f33 | 2 | 1.6% |
| c27 | 2 | 1.6% |
| c25 | 2 | 1.6% |
| c23 | 2 | 1.6% |
| b58 | 2 | 1.6% |
| Other values (84) | 100 | 81.3% |
| Value | Count | Frequency (%) |
| b18 | 2 | 1.7% |
| b51 | 2 | 1.7% |
| c23 | 2 | 1.7% |
| e101 | 2 | 1.7% |
| c123 | 2 | 1.7% |
| d | 2 | 1.7% |
| b55 | 2 | 1.7% |
| b53 | 2 | 1.7% |
| c125 | 2 | 1.7% |
| b49 | 2 | 1.7% |
| Other values (91) | 97 | 82.9% |
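These word tables appear to count lowercased, whitespace-separated tokens. Under that reading, a multi-cabin entry such as "B51 B53 B55" contributes three words. That would explain why the counts total 123 and 117 words, against 103 and 101 non-missing cabins. The sketch below works under that assumption. The helper name is hypothetical, and the profiler's actual tokenizer may do more (for example, drop punctuation or stop words).

```python
from collections import Counter


def word_counts(texts):
    """Count lowercase, whitespace-separated tokens, skipping missing values (None)."""
    counter = Counter()
    for text in texts:
        if text is None:  # missing cabin
            continue
        counter.update(text.lower().split())
    return counter


# First five Dataset A cabins from the Sample table, plus one missing entry.
cabins = ["E31", "B86", "B51 B53 B55", "E63", "A10", None]
print(word_counts(cabins).most_common(3))
```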
Most occurring characters
| Value | Count | Frequency (%) |
| B | 38 | 10.0% |
| C | 34 | 8.9% |
| 1 | 34 | 8.9% |
| 2 | 33 | 8.7% |
| 3 | 29 | 7.6% |
| 6 | 28 | 7.3% |
| 5 | 25 | 6.6% |
| (space) | 20 | 5.2% |
| 0 | 20 | 5.2% |
| D | 19 | 5.0% |
| Other values (9) | 101 | 26.5% |
| Value | Count | Frequency (%) |
| 1 | 38 | 10.6% |
| C | 34 | 9.4% |
| B | 33 | 9.2% |
| 3 | 32 | 8.9% |
| 5 | 28 | 7.8% |
| 2 | 26 | 7.2% |
| 6 | 20 | 5.6% |
| 4 | 19 | 5.3% |
| D | 19 | 5.3% |
| 8 | 18 | 5.0% |
| Other values (9) | 93 | 25.8% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 238 | 62.5% |
| Uppercase Letter | 123 | 32.3% |
| Space Separator | 20 | 5.2% |
| Value | Count | Frequency (%) |
| Decimal Number | 227 | 63.1% |
| Uppercase Letter | 117 | 32.5% |
| Space Separator | 16 | 4.4% |
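The category names are Unicode general categories: uppercase letters are `Lu`, digits are `Nd`, and the space between multiple cabins is `Zs`. Python's standard `unicodedata.category` returns these short codes, so the tallies can be sketched as follows. The mapping from short codes to long names is written out by hand here, and the sample cabins are the first five Dataset A rows from the Sample table.

```python
import unicodedata
from collections import Counter

# Long names for the general categories that occur in cabin codes.
CATEGORY_NAMES = {"Lu": "Uppercase Letter", "Nd": "Decimal Number", "Zs": "Space Separator"}


def category_counts(texts):
    """Tally the Unicode general category of every character in texts."""
    counts = Counter()
    for text in texts:
        for ch in text:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


counts = category_counts(["E31", "B86", "B51 B53 B55", "E63", "A10"])
total = sum(counts.values())
for name, count in counts.most_common():
    print(f"{name}: {count} ({100 * count / total:.1f}%)")
```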
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| B | 38 | 30.9% |
| C | 34 | 27.6% |
| D | 19 | 15.4% |
| E | 19 | 15.4% |
| F | 6 | 4.9% |
| G | 3 | 2.4% |
| A | 3 | 2.4% |
| T | 1 | 0.8% |
| Value | Count | Frequency (%) |
| C | 34 | 29.1% |
| B | 33 | 28.2% |
| D | 19 | 16.2% |
| E | 16 | 13.7% |
| A | 6 | 5.1% |
| F | 6 | 5.1% |
| G | 2 | 1.7% |
| T | 1 | 0.9% |
Decimal Number
| Value | Count | Frequency (%) |
| 1 | 34 | 14.3% |
| 2 | 33 | 13.9% |
| 3 | 29 | 12.2% |
| 6 | 28 | 11.8% |
| 5 | 25 | 10.5% |
| 0 | 20 | 8.4% |
| 8 | 19 | 8.0% |
| 4 | 17 | 7.1% |
| 7 | 17 | 7.1% |
| 9 | 16 | 6.7% |
| Value | Count | Frequency (%) |
| 1 | 38 | 16.7% |
| 3 | 32 | 14.1% |
| 5 | 28 | 12.3% |
| 2 | 26 | 11.5% |
| 6 | 20 | 8.8% |
| 4 | 19 | 8.4% |
| 8 | 18 | 7.9% |
| 7 | 17 | 7.5% |
| 9 | 17 | 7.5% |
| 0 | 12 | 5.3% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 20 | 100.0% |
| Value | Count | Frequency (%) |
| (space) | 16 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 258 | 67.7% |
| Latin | 123 | 32.3% |
| Value | Count | Frequency (%) |
| Common | 243 | 67.5% |
| Latin | 117 | 32.5% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| B | 38 | 30.9% |
| C | 34 | 27.6% |
| D | 19 | 15.4% |
| E | 19 | 15.4% |
| F | 6 | 4.9% |
| G | 3 | 2.4% |
| A | 3 | 2.4% |
| T | 1 | 0.8% |
| Value | Count | Frequency (%) |
| C | 34 | 29.1% |
| B | 33 | 28.2% |
| D | 19 | 16.2% |
| E | 16 | 13.7% |
| A | 6 | 5.1% |
| F | 6 | 5.1% |
| G | 2 | 1.7% |
| T | 1 | 0.9% |
Common
| Value | Count | Frequency (%) |
| 1 | 34 | 13.2% |
| 2 | 33 | 12.8% |
| 3 | 29 | 11.2% |
| 6 | 28 | 10.9% |
| 5 | 25 | 9.7% |
| (space) | 20 | 7.8% |
| 0 | 20 | 7.8% |
| 8 | 19 | 7.4% |
| 4 | 17 | 6.6% |
| 7 | 17 | 6.6% |
| Value | Count | Frequency (%) |
| 1 | 38 | 15.6% |
| 3 | 32 | 13.2% |
| 5 | 28 | 11.5% |
| 2 | 26 | 10.7% |
| 6 | 20 | 8.2% |
| 4 | 19 | 7.8% |
| 8 | 18 | 7.4% |
| 7 | 17 | 7.0% |
| 9 | 17 | 7.0% |
| (space) | 16 | 6.6% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 381 | 100.0% |
| Value | Count | Frequency (%) |
| ASCII | 360 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| B | 38 | 10.0% |
| C | 34 | 8.9% |
| 1 | 34 | 8.9% |
| 2 | 33 | 8.7% |
| 3 | 29 | 7.6% |
| 6 | 28 | 7.3% |
| 5 | 25 | 6.6% |
| (space) | 20 | 5.2% |
| 0 | 20 | 5.2% |
| D | 19 | 5.0% |
| Other values (9) | 101 | 26.5% |
| Value | Count | Frequency (%) |
| 1 | 38 | 10.6% |
| C | 34 | 9.4% |
| B | 33 | 9.2% |
| 3 | 32 | 8.9% |
| 5 | 28 | 7.8% |
| 2 | 26 | 7.2% |
| 6 | 20 | 5.6% |
| 4 | 19 | 5.3% |
| D | 19 | 5.3% |
| 8 | 18 | 5.0% |
| Other values (9) | 93 | 25.8% |
Embarked
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 3 | 3 |
| Distinct (%) | 0.7% | 0.7% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Characters and Unicode
| | Dataset A | Dataset B |
|---|---|---|
| Total characters | 446 | 446 |
| Distinct characters | 3 | 3 |
| Distinct categories | 1 | 1 |
| Distinct scripts | 1 | 1 |
| Distinct blocks | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | S | S |
| 2nd row | S | S |
| 3rd row | S | S |
| 4th row | C | S |
| 5th row | S | S |
Common Values
| Value | Count | Frequency (%) |
| S | 333 | 74.7% |
| C | 80 | 17.9% |
| Q | 33 | 7.4% |
| Value | Count | Frequency (%) |
| S | 321 | 72.0% |
| C | 83 | 18.6% |
| Q | 42 | 9.4% |
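Frequency (%) is the count divided by the number of rows. With 446 rows and no missing Embarked values, Dataset A's 80 "C" passengers give 80 / 446 ≈ 17.9%. The short check below (helper name hypothetical) reproduces the printed percentages to one decimal place:

```python
def percentages(counts, n):
    """Map each value to its share of n rows, rounded to one decimal place."""
    return {value: round(100 * count / n, 1) for value, count in counts.items()}


N = 446  # rows per dataset; Embarked has no missing values
embarked_a = {"S": 333, "C": 80, "Q": 33}
embarked_b = {"S": 321, "C": 83, "Q": 42}

# The category counts should cover every row.
assert sum(embarked_a.values()) == N and sum(embarked_b.values()) == N
print(percentages(embarked_a, N))
print(percentages(embarked_b, N))
```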
| Value | Count | Frequency (%) |
| s | 333 | 74.7% |
| c | 80 | 17.9% |
| q | 33 | 7.4% |
| Value | Count | Frequency (%) |
| s | 321 | 72.0% |
| c | 83 | 18.6% |
| q | 42 | 9.4% |
Most occurring characters
| Value | Count | Frequency (%) |
| S | 333 | 74.7% |
| C | 80 | 17.9% |
| Q | 33 | 7.4% |
| Value | Count | Frequency (%) |
| S | 321 | 72.0% |
| C | 83 | 18.6% |
| Q | 42 | 9.4% |
Most occurring categories
| Value | Count | Frequency (%) |
| Uppercase Letter | 446 | 100.0% |
| Value | Count | Frequency (%) |
| Uppercase Letter | 446 | 100.0% |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| S | 333 | 74.7% |
| C | 80 | 17.9% |
| Q | 33 | 7.4% |
| Value | Count | Frequency (%) |
| S | 321 | 72.0% |
| C | 83 | 18.6% |
| Q | 42 | 9.4% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 446 | 100.0% |
| Value | Count | Frequency (%) |
| Latin | 446 | 100.0% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| S | 333 | 74.7% |
| C | 80 | 17.9% |
| Q | 33 | 7.4% |
| Value | Count | Frequency (%) |
| S | 321 | 72.0% |
| C | 83 | 18.6% |
| Q | 42 | 9.4% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 446 | 100.0% |
| Value | Count | Frequency (%) |
| ASCII | 446 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| S | 333 | 74.7% |
| C | 80 | 17.9% |
| Q | 33 | 7.4% |
| Value | Count | Frequency (%) |
| S | 321 | 72.0% |
| C | 83 | 18.6% |
| Q | 42 | 9.4% |
Dataset A
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 338 | 339 | 1 | 3 | Dahl, Mr. Karl Edwart | male | 45.00 | 0 | 0 | 7598 | 8.050 | NaN | S |
| 92 | 93 | 0 | 1 | Chaffee, Mr. Herbert Fuller | male | 46.00 | 1 | 0 | W.E.P. 5734 | 61.175 | E31 | S |
| 211 | 212 | 1 | 2 | Cameron, Miss. Clear Annie | female | 35.00 | 0 | 0 | F.C.C. 13528 | 21.000 | NaN | S |
| 139 | 140 | 0 | 1 | Giglio, Mr. Victor | male | 24.00 | 0 | 0 | PC 17593 | 79.200 | B86 | C |
| 831 | 832 | 1 | 2 | Richards, Master. George Sibley | male | 0.83 | 1 | 1 | 29106 | 18.750 | NaN | S |
| 276 | 277 | 0 | 3 | Lindblom, Miss. Augusta Charlotta | female | 45.00 | 0 | 0 | 347073 | 7.750 | NaN | S |
| 439 | 440 | 0 | 2 | Kvillner, Mr. Johan Henrik Johannesson | male | 31.00 | 0 | 0 | C.A. 18723 | 10.500 | NaN | S |
| 635 | 636 | 1 | 2 | Davis, Miss. Mary | female | 28.00 | 0 | 0 | 237668 | 13.000 | NaN | S |
| 614 | 615 | 0 | 3 | Brocklebank, Mr. William Alfred | male | 35.00 | 0 | 0 | 364512 | 8.050 | NaN | S |
| 346 | 347 | 1 | 2 | Smith, Miss. Marion Elsie | female | 40.00 | 0 | 0 | 31418 | 13.000 | NaN | S |
Dataset B
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 45 | 46 | 0 | 3 | Rogers, Mr. William John | male | NaN | 0 | 0 | S.C./A.4. 23567 | 8.0500 | NaN | S |
| 226 | 227 | 1 | 2 | Mellors, Mr. William John | male | 19.0 | 0 | 0 | SW/PP 751 | 10.5000 | NaN | S |
| 678 | 679 | 0 | 3 | Goodwin, Mrs. Frederick (Augusta Tyler) | female | 43.0 | 1 | 6 | CA 2144 | 46.9000 | NaN | S |
| 197 | 198 | 0 | 3 | Olsen, Mr. Karl Siegwart Andreas | male | 42.0 | 0 | 1 | 4579 | 8.4042 | NaN | S |
| 15 | 16 | 1 | 2 | Hewlett, Mrs. (Mary D Kingcome) | female | 55.0 | 0 | 0 | 248706 | 16.0000 | NaN | S |
| 280 | 281 | 0 | 3 | Duane, Mr. Frank | male | 65.0 | 0 | 0 | 336439 | 7.7500 | NaN | Q |
| 846 | 847 | 0 | 3 | Sage, Mr. Douglas Bullen | male | NaN | 8 | 2 | CA. 2343 | 69.5500 | NaN | S |
| 638 | 639 | 0 | 3 | Panula, Mrs. Juha (Maria Emilia Ojala) | female | 41.0 | 0 | 5 | 3101295 | 39.6875 | NaN | S |
| 241 | 242 | 1 | 3 | Murphy, Miss. Katherine "Kate" | female | NaN | 1 | 0 | 367230 | 15.5000 | NaN | Q |
| 87 | 88 | 0 | 3 | Slocovski, Mr. Selman Francis | male | NaN | 0 | 0 | SOTON/OQ 392086 | 8.0500 | NaN | S |
Dataset A
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 333 | 334 | 0 | 3 | Vander Planke, Mr. Leo Edmondus | male | 16.0 | 2 | 0 | 345764 | 18.0000 | NaN | S |
| 13 | 14 | 0 | 3 | Andersson, Mr. Anders Johan | male | 39.0 | 1 | 5 | 347082 | 31.2750 | NaN | S |
| 492 | 493 | 0 | 1 | Molson, Mr. Harry Markland | male | 55.0 | 0 | 0 | 113787 | 30.5000 | C30 | S |
| 399 | 400 | 1 | 2 | Trout, Mrs. William H (Jessie L) | female | 28.0 | 0 | 0 | 240929 | 12.6500 | NaN | S |
| 357 | 358 | 0 | 2 | Funk, Miss. Annie Clemmer | female | 38.0 | 0 | 0 | 237671 | 13.0000 | NaN | S |
| 609 | 610 | 1 | 1 | Shutes, Miss. Elizabeth W | female | 40.0 | 0 | 0 | PC 17582 | 153.4625 | C125 | S |
| 648 | 649 | 0 | 3 | Willey, Mr. Edward | male | NaN | 0 | 0 | S.O./P.P. 751 | 7.5500 | NaN | S |
| 722 | 723 | 0 | 2 | Gillespie, Mr. William Henry | male | 34.0 | 0 | 0 | 12233 | 13.0000 | NaN | S |
| 248 | 249 | 1 | 1 | Beckwith, Mr. Richard Leonard | male | 37.0 | 1 | 1 | 11751 | 52.5542 | D35 | S |
| 567 | 568 | 0 | 3 | Palsson, Mrs. Nils (Alma Cornelia Berglund) | female | 29.0 | 0 | 4 | 349909 | 21.0750 | NaN | S |
Dataset B
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 299 | 300 | 1 | 1 | Baxter, Mrs. James (Helene DeLaudeniere Chaput) | female | 50.0 | 0 | 1 | PC 17558 | 247.5208 | B58 B60 | C |
| 503 | 504 | 0 | 3 | Laitinen, Miss. Kristina Sofia | female | 37.0 | 0 | 0 | 4135 | 9.5875 | NaN | S |
| 609 | 610 | 1 | 1 | Shutes, Miss. Elizabeth W | female | 40.0 | 0 | 0 | PC 17582 | 153.4625 | C125 | S |
| 582 | 583 | 0 | 2 | Downton, Mr. William James | male | 54.0 | 0 | 0 | 28403 | 26.0000 | NaN | S |
| 765 | 766 | 1 | 1 | Hogeboom, Mrs. John C (Anna Andrews) | female | 51.0 | 1 | 0 | 13502 | 77.9583 | D11 | S |
| 860 | 861 | 0 | 3 | Hansen, Mr. Claus Peter | male | 41.0 | 2 | 0 | 350026 | 14.1083 | NaN | S |
| 785 | 786 | 0 | 3 | Harmer, Mr. Abraham (David Lishin) | male | 25.0 | 0 | 0 | 374887 | 7.2500 | NaN | S |
| 422 | 423 | 0 | 3 | Zimmerman, Mr. Leo | male | 29.0 | 0 | 0 | 315082 | 7.8750 | NaN | S |
| 516 | 517 | 1 | 2 | Lemore, Mrs. (Amelia Milley) | female | 34.0 | 0 | 0 | C.A. 34260 | 10.5000 | F33 | S |
| 339 | 340 | 0 | 1 | Blackwell, Mr. Stephen Weart | male | 45.0 | 0 | 0 | 113784 | 35.5000 | T | S |
Dataset A
| PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | # duplicates |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Dataset does not contain duplicate rows. | | | | | | | | | | | | |
Dataset B
| PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | # duplicates |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Dataset does not contain duplicate rows. | | | | | | | | | | | | |
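Both datasets report zero duplicate rows. One way to sketch that check is to count identical row tuples across all twelve columns and keep any that occur more than once. In the rows below, taken from the Dataset B sample, `None` stands in for the report's NaN, because two separately created float NaNs never compare equal. The helper name and that substitution are assumptions for illustration, not the profiler's implementation.

```python
from collections import Counter


def duplicate_rows(rows):
    """Return {row: occurrences} for every row that appears more than once."""
    counts = Counter(tuple(row) for row in rows)
    return {row: c for row, c in counts.items() if c > 1}


# Two rows from the Dataset B sample; None stands in for a missing value.
rows = [
    (46, 0, 3, "Rogers, Mr. William John", "male", None, 0, 0, "S.C./A.4. 23567", 8.05, None, "S"),
    (227, 1, 2, "Mellors, Mr. William John", "male", 19.0, 0, 0, "SW/PP 751", 10.5, None, "S"),
]
print(duplicate_rows(rows) or "Dataset does not contain duplicate rows.")
```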